Appendix of "Hierarchical Vector Quantized Transformer for Multi-class Unsupervised Anomaly Detection"

Neural Information Processing Systems

The hyperparameters β and α are set to 0.5 and 0.01 for each layer. CIFAR-10: the image size is set to 224 x 224, and the feature size is 14 x 14. The encoder and decoder layers are both set to 4, and the hyperparameters β and α are again set to 0.5 and 0.01 for each layer. The ELBO of our variational autoencoder should include both a reconstruction likelihood and a KL term. Since the KL divergence term of the Evidence Lower Bound (ELBO) is constant, it can be ignored during training.
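The point about the constant KL term can be illustrated with a minimal NumPy sketch (not the paper's actual loss): when the KL divergence does not depend on the model parameters, maximizing the ELBO reduces to maximizing the reconstruction likelihood alone. The function names and the Gaussian (squared-error) likelihood here are illustrative assumptions.

```python
import numpy as np

def reconstruction_ll(x, x_hat):
    # Negative squared error, i.e. a Gaussian log-likelihood up to a constant.
    return -np.sum((x - x_hat) ** 2)

def elbo(x, x_hat, kl):
    # ELBO = reconstruction log-likelihood - KL divergence.
    return reconstruction_ll(x, x_hat) - kl

x = np.array([1.0, 2.0, 3.0])
x_hat = np.array([1.1, 1.9, 3.0])
const_kl = 0.5
# With a constant KL term, the gradient of the ELBO w.r.t. the model
# parameters equals the gradient of the reconstruction term alone,
# so the KL term can be dropped from the training objective.
print(elbo(x, x_hat, const_kl))
```

Only the reconstruction term then needs to be optimized during training.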



Another BRIXEL in the Wall: Towards Cheaper Dense Features

Lappe, Alexander, Giese, Martin A.

arXiv.org Artificial Intelligence

Vision foundation models achieve strong performance on both global and locally dense downstream tasks. Pretrained on large images, the recent DINOv3 model family is able to produce very fine-grained dense feature maps, enabling state-of-the-art performance. However, computing these feature maps requires the input image to be available at very high resolution, as well as large amounts of compute due to the squared complexity of the transformer architecture. To address these issues, we propose BRIXEL, a simple knowledge distillation approach that has the student learn to reproduce its own feature maps at higher resolution. Despite its simplicity, BRIXEL outperforms the baseline DINOv3 models by large margins on downstream tasks when the resolution is kept fixed. Moreover, it is able to produce feature maps that are very similar to those of the teacher at a fraction of the computational cost.
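The distillation idea of matching a student's low-resolution feature maps to a higher-resolution target can be sketched in NumPy (this is a generic feature-distillation loss under assumed shapes, not BRIXEL's actual training objective):

```python
import numpy as np

def upsample(feats, factor):
    # Nearest-neighbor upsampling of a (C, H, W) feature map.
    return feats.repeat(factor, axis=1).repeat(factor, axis=2)

def distill_loss(student_lowres, teacher_highres, factor):
    # Mean squared error between the upsampled student features
    # and the teacher's high-resolution feature map.
    up = upsample(student_lowres, factor)
    return float(np.mean((up - teacher_highres) ** 2))

rng = np.random.default_rng(0)
student = rng.standard_normal((8, 7, 7))   # low-res student features
teacher = upsample(student, 2)             # perfectly matched target
print(distill_loss(student, teacher, 2))   # → 0.0
```

In practice the student would run on a cheaper, lower-resolution input while the teacher's feature map comes from the expensive high-resolution forward pass.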




Improve bounding box in Carla Simulator

Chaar, Mohamad Mofeed, Raiyn, Jamal, Weidl, Galia

arXiv.org Artificial Intelligence

The CARLA simulator (Car Learning to Act) serves as a robust platform for testing algorithms and generating datasets in the field of Autonomous Driving (AD). It provides control over various environmental parameters, enabling thorough evaluation. Bounding boxes are commonly utilized tools in deep learning development and play a crucial role in AD applications. The predominant method for data generation in the CARLA simulator involves identifying and delineating objects of interest, such as vehicles, using bounding boxes. The operation in CARLA entails capturing the coordinates of all objects on the map, which are subsequently aligned with the sensor's coordinate system at the ego vehicle and then enclosed within bounding boxes relative to the ego vehicle's perspective. However, this primary approach encounters challenges with object detection and bounding box annotation, such as ghost boxes. Although these procedures are generally effective at detecting vehicles and other objects within their direct line of sight, they may also produce false positives by identifying objects that are obscured by obstructions. We have enhanced the primary approach with the objective of filtering out such unwanted boxes. Performance analysis indicates that the improved approach achieves high accuracy.
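One common way to filter ghost boxes of this kind is a depth consistency check: if a depth sensor measures a much smaller distance at the box centre than the annotated object's distance, something occludes the object and the box is discarded. The sketch below illustrates that idea with a hypothetical box format and tolerance; it is not the paper's actual filtering method.

```python
import numpy as np

def filter_ghost_boxes(boxes, depth_map, tol=1.0):
    """Keep boxes whose annotated distance agrees with the measured depth.

    boxes: list of (cx, cy, distance) tuples, where (cx, cy) is the pixel
    centre of the box and distance is the object's distance from the ego
    vehicle (hypothetical fields for illustration).
    depth_map: (H, W) array of per-pixel depth from a depth camera.
    """
    kept = []
    for cx, cy, dist in boxes:
        measured = depth_map[cy, cx]
        # If the depth camera sees something far closer than the object,
        # the object is occluded and its box is a ghost box.
        if measured + tol >= dist:
            kept.append((cx, cy, dist))
    return kept

depth = np.full((4, 4), 10.0)
depth[1, 1] = 3.0  # an obstruction in front of pixel (1, 1)
boxes = [(1, 1, 9.0),   # vehicle hidden behind the obstruction
         (2, 2, 9.5)]   # visible vehicle
print(filter_ghost_boxes(boxes, depth))  # → [(2, 2, 9.5)]
```

A single centre-pixel lookup is the simplest variant; a more robust check would compare depth statistics over the whole box region.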




Performance comparison of medical image classification systems using TensorFlow Keras, PyTorch, and JAX

Bećirović, Merjem, Kurtović, Amina, Smajlović, Nordin, Kapo, Medina, Akagić, Amila

arXiv.org Artificial Intelligence

Medical imaging plays a vital role in early disease diagnosis and monitoring. Specifically, blood microscopy offers valuable insights into blood cell morphology and the detection of hematological disorders. In recent years, deep learning-based automated classification systems have demonstrated high potential in enhancing the accuracy and efficiency of blood image analysis. However, a detailed performance analysis of specific deep learning frameworks appears to be lacking. This paper compares the performance of three popular deep learning frameworks, TensorFlow with Keras, PyTorch, and JAX, in classifying blood cell images from the publicly available BloodMNIST dataset. The study primarily focuses on inference time differences, but also classification performance for different image sizes. The results reveal variations in performance across frameworks, influenced by factors such as image resolution and framework-specific optimizations. Classification accuracy for JAX and PyTorch was comparable to current benchmarks, showcasing the efficiency of these frameworks for medical image classification.
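Inference-time comparisons of this kind hinge on careful timing methodology. A minimal, framework-agnostic sketch (using a NumPy stand-in model rather than TensorFlow, PyTorch, or JAX) looks like this; the warm-up runs are important because frameworks with JIT compilation, such as JAX, pay a large one-off cost on the first call:

```python
import time
import numpy as np

def time_inference(model, x, n_runs=10, warmup=2):
    # Warm-up excludes one-off costs (JIT compilation, cache fills).
    for _ in range(warmup):
        model(x)
    start = time.perf_counter()
    for _ in range(n_runs):
        model(x)
    # Mean wall-clock time per forward pass, in seconds.
    return (time.perf_counter() - start) / n_runs

# Stand-in "model": a single dense layer over a batch of 28x28 images.
w = np.random.default_rng(0).standard_normal((28 * 28, 8))
model = lambda x: x.reshape(len(x), -1) @ w
batch = np.zeros((32, 28, 28))
avg = time_inference(model, batch)
print(f"mean inference time: {avg * 1e3:.3f} ms")
```

The same harness can wrap each framework's predict function so that all three are measured under identical batch sizes and image resolutions.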